The abductive natural language inference task ($\alpha$NLI) is proposed to infer the most plausible explanation between the cause and the event. In the $\alpha$NLI task, two observations are given, and the most plausible hypothesis must be picked out from the candidates. Existing methods model the relation between each candidate hypothesis separately and penalize the inference network uniformly. In this paper, we argue that it is unnecessary to distinguish the reasoning ability among correct hypotheses; similarly, all incorrect hypotheses contribute equally when explaining the causes of the observations. Therefore, we propose to group rather than rank the hypotheses, and design a structural loss called the "joint softmax focal loss". Based on the observation that hypotheses are generally semantically related, we design a novel interactive language model aimed at exploiting the rich interactions among competing hypotheses. We name this new model for $\alpha$NLI the Interactive Model with Structural Loss (IMSL). Experimental results show that IMSL achieves the highest performance on the RoBERTa-large pre-trained model, with ACC and AUC improved by about 1% and 5%, respectively.
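The abstract names the loss but not its exact form. As a minimal sketch of how a grouped objective might look, assuming the loss pools softmax probability over the whole group of correct hypotheses and applies a standard focal term (the function name, signature, and default gamma are our illustration, not the paper's):

```python
import torch
import torch.nn.functional as F

def joint_softmax_focal_loss(scores, labels, gamma=2.0):
    # scores: (num_candidates,) plausibility scores for all hypotheses
    #         of one observation pair.
    # labels: (num_candidates,) 1 for correct hypotheses, 0 for wrong ones.
    probs = F.softmax(scores, dim=-1)
    # Pool the probability mass of all correct hypotheses into one group,
    # so correct hypotheses are never ranked against each other.
    p_group = probs[labels.bool()].sum().clamp(1e-8, 1.0)
    # Focal weighting down-weights examples the model already groups well.
    return -((1.0 - p_group) ** gamma) * torch.log(p_group)

scores = torch.tensor([2.1, 1.9, -0.5, 0.3])  # two correct, two wrong candidates
labels = torch.tensor([1, 1, 0, 0])
print(joint_softmax_focal_loss(scores, labels))  # small: correct group dominates
```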
Unsupervised domain adaptation (UDA) aims to learn a model on a labeled source domain that performs well on an unlabeled target domain. In the medical image segmentation field, most existing UDA methods depend on adversarial learning to address the domain gap between different image modalities, which is ineffective due to its complicated training process. In this paper, we propose a simple yet effective UDA method based on frequency and spatial domain transfer under a multi-teacher distillation framework. In the frequency domain, we first introduce the non-subsampled contourlet transform for identifying domain-invariant and domain-variant frequency components (DIFs and DVFs), and then keep the DIFs unchanged while replacing the DVFs of the source domain images with those of the target domain images to narrow the domain gap. In the spatial domain, we propose a batch momentum update-based histogram matching strategy to reduce the domain-variant image style bias. Experiments on two cross-modality medical image segmentation datasets (cardiac, abdominal) show that our proposed method achieves superior performance compared to state-of-the-art methods.
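The spatial-domain half of the method can be pictured concretely. The sketch below implements plain CDF-based histogram matching plus a momentum-updated target-domain reference; reading "batch momentum update" as an exponentially averaged reference image is our assumption, not a detail given in the abstract:

```python
import numpy as np

def match_histograms(source, reference):
    """Map source intensities so their histogram matches the reference's."""
    _, s_idx, s_counts = np.unique(source.ravel(),
                                   return_inverse=True, return_counts=True)
    r_vals, r_counts = np.unique(reference.ravel(), return_counts=True)
    s_cdf = np.cumsum(s_counts) / source.size
    r_cdf = np.cumsum(r_counts) / reference.size
    # Quantile mapping: each source quantile takes the reference intensity.
    matched = np.interp(s_cdf, r_cdf, r_vals)
    return matched[s_idx].reshape(source.shape)

class MomentumReference:
    """Target-domain reference image, exponentially averaged over batches."""
    def __init__(self, momentum=0.9):
        self.momentum, self.reference = momentum, None

    def update(self, target_batch):                 # (B, H, W) array
        batch_mean = target_batch.mean(axis=0)
        self.reference = (batch_mean if self.reference is None else
                          self.momentum * self.reference
                          + (1 - self.momentum) * batch_mean)
        return self.reference

ref = MomentumReference()
target_batch = np.random.rand(4, 64, 64)            # toy target-domain batch
source_image = np.random.rand(64, 64) ** 2          # style-biased source image
styled = match_histograms(source_image, ref.update(target_batch))
```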
Histopathology images contain rich phenotypic information and pathological patterns; they are the gold standard for disease diagnosis and are essential for predicting patient prognosis and treatment outcome. In recent years, computer-automated analysis techniques for histopathology images have been urgently needed in clinical practice, and deep learning methods, represented by convolutional neural networks, have gradually become the mainstream of digital pathology. However, obtaining large amounts of fine-grained annotated data in this field is a very expensive and arduous task, which hinders the further development of traditional supervised algorithms that rely on large-scale annotation. Recent studies have begun to break free from the traditional supervised paradigm; the most representative are studies of weakly supervised learning based on weak annotations, semi-supervised learning based on limited annotations, and self-supervised learning based on image representation learning. These new methods have sparked a wave of annotation-efficient automated pathology image diagnosis and analysis. Through a survey of 130 papers, we present a comprehensive and systematic review of the latest research on weakly supervised learning, semi-supervised learning, and self-supervised learning in computational pathology from both technical and methodological perspectives. Finally, we discuss the key challenges and future trends of these techniques.
Heterogeneous graphs, which contain multiple types of nodes and edges, are ubiquitous in various domains, including bibliographic networks, social media, and knowledge graphs. As a fundamental task in analyzing heterogeneous graphs, relevance measures aim to compute the relevance between two objects of different types, and have been used in many applications such as web search, recommendation, and community detection. Most existing relevance measures focus on homogeneous networks where objects are of the same type; a few measures have been developed for heterogeneous graphs, but they usually require predefined meta-paths. Defining meaningful meta-paths requires substantial domain knowledge, which largely limits their applications, especially on schema-rich heterogeneous graphs such as knowledge graphs. Recently, graph neural networks (GNNs) have been widely applied to many graph mining tasks, but they have not yet been used for measuring relevance. To address these problems, we propose a novel GNN-based relevance measure, namely GSim. Specifically, we first analyze theoretically and show that GNNs effectively measure the relevance of nodes in a graph. We then propose a context-path-based graph neural network (CP-GNN) to automatically exploit the semantics in heterogeneous graphs. Moreover, we leverage CP-GNN to support relevance measurement between two objects of any type. Extensive experiments show that GSim outperforms existing measures.
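The claim that GNN propagation itself yields a usable relevance signal can be illustrated with a toy, parameter-free sketch (this is not CP-GNN or GSim, just mean aggregation followed by cosine similarity between node embeddings):

```python
import numpy as np

def gnn_embeddings(adj, features, num_layers=2):
    """Parameter-free mean-aggregation GNN, purely for illustration."""
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)
    h = features
    for _ in range(num_layers):
        h = np.maximum((adj @ h) / deg, 0)   # average neighbours + ReLU
    return h

def relevance(adj, features, u, v):
    """Relevance of u and v as cosine similarity of their embeddings."""
    h = gnn_embeddings(adj, features)
    return float(h[u] @ h[v] /
                 (np.linalg.norm(h[u]) * np.linalg.norm(h[v]) + 1e-8))

# Toy graph: path 0-1-2 plus an isolated node 3, one-hot features.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 0],
                [0, 0, 0, 0]], dtype=float)
x = np.eye(4)
print(relevance(adj, x, 0, 2))   # high: 0 and 2 share node 1's context
print(relevance(adj, x, 0, 3))   # ~0: node 3 is disconnected
```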
Nonunion is one of the challenges faced by orthopedic clinics, and imaging interosseous capillaries is technically difficult and costly. Segmenting vessels and filling capillaries are crucial for understanding the obstacles encountered during capillary growth. However, existing datasets for vessel segmentation mainly focus on the large vessels of the human body, and the lack of labeled capillary image datasets greatly limits the methodological development and application of vessel segmentation and capillary filling. Here, we present a benchmark dataset named IFCIS-155, consisting of 155 2D capillary images with segmentation boundaries and vessel fillings annotated by biomedical experts, as well as 19 large-scale, high-resolution 3D capillary images. To obtain better images of interosseous capillaries, we leverage state-of-the-art immunofluorescence imaging techniques to highlight their rich vascular morphology. We conduct comprehensive experiments to validate the effectiveness of the dataset and to benchmark deep learning models (e.g., UNet/UNet++ and modified UNet/UNet++). Our work provides a benchmark dataset for training deep learning models for capillary image segmentation and a potential tool for future capillary research. The IFCIS-155 dataset and the code are publicly available at https://github.com/ncclabsustech/ifcis-55.
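For readers who want to benchmark a segmentation model on such data, a hedged sketch of the training scaffold follows; the CapillaryDataset layout and the one-layer stand-in "model" are hypothetical, and the actual loaders ship with the repository linked above:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class CapillaryDataset(Dataset):
    """Hypothetical image/mask pair loader; the real IFCIS-155 layout
    and loaders live in the repository linked above."""
    def __init__(self, images, masks):
        self.images, self.masks = images, masks
    def __len__(self):
        return len(self.images)
    def __getitem__(self, i):
        return self.images[i], self.masks[i]

def train_epoch(model, loader, optimizer):
    """One epoch of binary capillary segmentation with BCE loss."""
    criterion = torch.nn.BCEWithLogitsLoss()
    for image, mask in loader:
        optimizer.zero_grad()
        loss = criterion(model(image), mask)
        loss.backward()
        optimizer.step()

# Smoke test on random tensors; a conv layer stands in for UNet/UNet++.
images = torch.rand(8, 1, 64, 64)
masks = (torch.rand(8, 1, 64, 64) > 0.5).float()
model = torch.nn.Conv2d(1, 1, 3, padding=1)
loader = DataLoader(CapillaryDataset(images, masks), batch_size=4)
train_epoch(model, loader, torch.optim.Adam(model.parameters(), lr=1e-3))
```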
Multiple instance learning (MIL) is widely used for analyzing histopathology whole slide images (WSIs). However, existing MIL methods do not explicitly model the data distribution; instead, they only discriminate the bag-level or instance-level decision boundary by training a classifier. In this paper, we propose DGMIL: a feature-distribution-guided deep MIL framework for WSI classification and positive patch localization. Instead of designing complex discriminative network architectures, we reveal that the inherent feature distribution of histopathology image data can serve as a very effective guide for classification. We propose a cluster-conditioned feature distribution modeling method and a pseudo-label-based iterative feature space refinement strategy so that, in the final feature space, positive and negative instances can be easily separated. Experiments on the Camelyon16 dataset and the TCGA lung cancer dataset show that our method achieves a new SOTA for both the global classification and the positive patch localization tasks.
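One way to picture the cluster-conditioned modeling plus iterative pseudo-label refinement, as we read it (the cluster count, thresholds, and the logistic-regression refinement step are our simplifications, not the authors' algorithm):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def dgmil_style_pseudo_labels(features, bag_labels, bag_ids, rounds=3):
    """Loose sketch of distribution-guided pseudo-labelling for MIL.

    features:   (n_instances, d) patch embeddings
    bag_labels: dict bag_id -> 0/1 slide-level label
    bag_ids:    per-instance bag id
    """
    neg = np.array([bag_labels[b] == 0 for b in bag_ids])
    # MIL guarantee: every instance from a negative slide is negative.
    km = KMeans(n_clusters=4, n_init=10).fit(features[neg])
    dist = km.transform(features).min(axis=1)
    # Far from all negative clusters => likely a positive patch.
    pseudo_pos = (dist > np.quantile(dist[neg], 0.95)) & ~neg
    for _ in range(rounds):
        if pseudo_pos.sum() == 0:      # nothing confidently positive yet
            break
        clf = LogisticRegression(max_iter=1000)
        clf.fit(features, pseudo_pos.astype(int))
        score = clf.decision_function(features)
        # Refine: keep known negatives fixed, re-threshold the rest.
        pseudo_pos = (score > np.quantile(score[neg], 0.95)) & ~neg
    return pseudo_pos

feats = np.random.rand(200, 16)
bags = [i // 20 for i in range(200)]              # 10 bags of 20 patches
labels = {b: int(b >= 5) for b in range(10)}      # bags 5-9 are positive
flags = dgmil_style_pseudo_labels(feats, labels, bags)
```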
In this paper, we propose TransMEF, a transformer-based multi-exposure image fusion framework that uses self-supervised multi-task learning. The framework is based on an encoder-decoder network that can be trained on large natural image datasets and does not require ground-truth fusion images. We design three self-supervised reconstruction tasks according to the characteristics of multi-exposure images and carry out these tasks simultaneously via multi-task learning; through this process, the network learns the characteristics of multi-exposure images and extracts more generalized features. In addition, to compensate for the difficulty CNN-based architectures have in establishing long-range dependencies, we design an encoder that combines a CNN module with a transformer module, enabling the network to attend to both local and global information. We evaluate our method against 11 competitive traditional and deep-learning-based methods on the latest released multi-exposure image fusion benchmark dataset, and our method achieves the best performance in both subjective and objective evaluations.
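A rough sketch of the training objective: corrupt each image three ways and ask one shared encoder-decoder to reconstruct the original under all three tasks. The specific corruptions below (gamma shift, noise, patch shuffling) are illustrative stand-ins for the paper's three reconstruction tasks, not their exact definitions:

```python
import torch
import torch.nn.functional as F

def corruptions(x, p=16):
    """Three stand-in corruptions of an image batch (b, c, h, w)."""
    gamma = x.clamp(min=1e-6) ** 2.2                       # exposure shift
    noisy = (x + 0.1 * torch.randn_like(x)).clamp(0, 1)    # sensor noise
    b, c, h, w = x.shape                                   # patch shuffling:
    patches = x.unfold(2, p, p).unfold(3, p, p).reshape(b, c, -1, p, p)
    perm = torch.randperm(patches.shape[2])
    shuffled = (patches[:, :, perm]
                .reshape(b, c, h // p, w // p, p, p)
                .permute(0, 1, 2, 4, 3, 5)
                .reshape(b, c, h, w))                      # destroys layout
    return gamma, noisy, shuffled

def multi_task_loss(encoder, decoder, x):
    """Sum of reconstruction losses over the three self-supervised tasks."""
    return sum(F.mse_loss(decoder(encoder(c)), x) for c in corruptions(x))

# Smoke test: plain conv layers stand in for the CNN+Transformer encoder.
enc = torch.nn.Conv2d(1, 8, 3, padding=1)
dec = torch.nn.Conv2d(8, 1, 3, padding=1)
print(multi_task_loss(enc, dec, torch.rand(2, 1, 64, 64)))
```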
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes the image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, yet its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
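A hedged structural sketch of the idea, not the released code: both modalities' tokens are tagged with encodings of 3D coordinates and fed to one transformer decoder whose object queries regress boxes directly; the layer sizes and 10-dimensional box parameterization are our assumptions:

```python
import torch
import torch.nn as nn

class CrossModalDetectorSketch(nn.Module):
    """Illustrative sketch of a CMT-style detector (not the authors' code)."""
    def __init__(self, dim=256, num_queries=100):
        super().__init__()
        self.img_pos = nn.Linear(3, dim)    # 3D coords -> position encoding
        self.pts_pos = nn.Linear(3, dim)
        self.queries = nn.Embedding(num_queries, dim)
        layer = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=6)
        # Assumed 10-dim box: centre, size, yaw (sin/cos), velocity.
        self.box_head = nn.Linear(dim, 10)

    def forward(self, img_tokens, img_coords, pts_tokens, pts_coords):
        # Implicit spatial alignment: both modalities share one 3D
        # position-encoding space instead of an explicit view transform.
        memory = torch.cat([img_tokens + self.img_pos(img_coords),
                            pts_tokens + self.pts_pos(pts_coords)], dim=1)
        q = self.queries.weight.unsqueeze(0).expand(img_tokens.size(0), -1, -1)
        return self.box_head(self.decoder(q, memory))

# Smoke test with random tokens and coordinates.
m = CrossModalDetectorSketch()
boxes = m(torch.rand(2, 50, 256), torch.rand(2, 50, 3),
          torch.rand(2, 40, 256), torch.rand(2, 40, 3))
print(boxes.shape)  # torch.Size([2, 100, 10])
```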
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
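The NAIVEATTACK variant is easy to picture: stamp a trigger on a fraction of the raw images and flip their labels before distillation ever runs. A sketch under those assumptions (the patch size, poison rate, and corner placement are our choices, not the paper's):

```python
import numpy as np

def add_trigger(images, labels, target_class, poison_frac=0.1):
    """NAIVEATTACK-style poisoning step, as we read it: trigger the raw
    data *before* distillation so the backdoor is baked into the
    synthetic dataset that downstream models train on."""
    images, labels = images.copy(), labels.copy()
    n_poison = int(len(images) * poison_frac)
    idx = np.random.choice(len(images), n_poison, replace=False)
    images[idx, -4:, -4:] = 1.0          # 4x4 white patch in the corner
    labels[idx] = target_class           # all triggered samples -> target
    return images, labels

x = np.random.rand(100, 32, 32)          # toy grayscale dataset in [0, 1]
y = np.random.randint(0, 10, size=100)
x_poisoned, y_poisoned = add_trigger(x, y, target_class=0)
# x_poisoned would then be fed to the dataset distillation procedure.
```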
Automatic music generation with artificial intelligence typically requires a large amount of data, which is hard to obtain for many less common genres and musical instruments. To tackle this issue, we present ongoing work and preliminary findings on the possibility for deep models to transfer knowledge from language to music, by finetuning large language models pre-trained on a massive text corpus on only hundreds of MIDI files of drum performances. We show that by doing so, one of the largest, state-of-the-art models (GPT3) is capable of generating reasonable drum grooves, while models that are not pre-trained (Transformer) show no such ability beyond naive repetition. Evaluating generated music is a challenging task, and evaluating drum grooves, with little precedent in the literature, even more so. Hence, we propose a tailored structural evaluation method and analyze drum grooves produced by GPT3 compared to those played by human professionals, exposing the strengths and weaknesses of such generation by language-to-music transfer. Our findings suggest that language-to-music transfer learning with large language models is viable and promising.
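The core move is representing drum MIDI as plain text so a text-pre-trained language model can be finetuned on it. A hypothetical encoding, purely for illustration (the paper's actual token scheme may differ):

```python
def drum_midi_to_text(notes, ticks_per_step=120):
    """Encode a drum track as a token string a language model can consume.
    `notes` is a list of (tick, pitch) pairs; this scheme is our own
    assumption, not the paper's."""
    tokens, last_tick = [], 0
    for tick, pitch in sorted(notes):
        gap = (tick - last_tick) // ticks_per_step
        tokens.extend(["WAIT"] * gap)          # quantized time steps
        tokens.append(f"DRUM_{pitch}")         # one token per drum hit
        last_tick = tick
    return " ".join(tokens)

# Kick on beats 1 and 3, snare on 2 and 4 (GM pitches 36 and 38).
pattern = [(0, 36), (480, 38), (960, 36), (1440, 38)]
print(drum_midi_to_text(pattern))
# -> "DRUM_36 WAIT WAIT WAIT WAIT DRUM_38 WAIT WAIT WAIT WAIT DRUM_36 ..."
```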